ABSTRACT


A brief discussion of the literature concerned with the
two-population discrimination problem is presented, and several
procedures based on the likelihood ratio for discrimination
between negative exponentially distributed populations
are proposed. The small-sample and asymptotic performance of
these procedures is compared with that of non-parametric
procedures and the classical linear discriminant function.
Some guidelines for the use of the procedures discussed are
presented.
I. INTRODUCTION


The problem of classification arises when one or more
measurements are made on an individual and one wishes to classify
the individual as belonging to one of a finite number of
categories on the basis of these measurements. Each category
is characterized by a probability distribution of the measurements,
but the proper category of the individual is not observable;
it must be inferred from the measurements. Thus the
problem, in abstract terms, is: given an observation of a
random variable arising from one of several populations, find
a rule for deciding from which population the observation
came.

The classification problem is, then, one of finding an
appropriate "statistical decision function." We have a number
of hypotheses: each hypothesis is that the distribution
of the observation is that corresponding to a given population,
and one of these hypotheses must be selected, the
others rejected.

In the classification problem, there are essentially
three levels of information about the distributions corresponding
to the various populations which may be available
to the statistician.

1. the distributions may be completely known

2. the distributions may be known to belong to a
   given family indexed by a parameter which is
   unknown

3. the distributions may be completely unknown


In cases 2) and 3), information about the value of the param- 
eter or about the unknown distribution is usually available 
from a sample or sequence of realizations of the random var- 
iable corresponding to each population. 

In the investigations reported in this thesis, the individual
to be classified belongs to one of two populations.
In this situation, case 1) above is equivalent to the simple
vs. simple hypothesis testing problem whose solution is given
by the Neyman-Pearson Lemma. Case 2) has received relatively
little attention except under the assumption that the family
of distributions is multivariate normal with the same (but
unknown) covariance matrix. The distributions of the statistics
arising in this situation have been derived. In addition,
Hoel and Peterson (5) have derived very general conditions
under which procedures using sample estimates of
the parameters are asymptotically optimal. Case 3) was
first considered by Fix and Hodges in 1951.

In Section II of this thesis the non-parametric procedure
proposed by Fix and Hodges (2,3) and the application of
this procedure when the distribution of the random variables
is negative exponential will be reviewed. A bound on the
error probabilities of the Fix-Hodges procedure discovered
by Cover and Hart (1) and a more general procedure proposed
by Loftsgaarden and Quesenberry (6) will also be examined.

Section III will present the results of a study of a
Likelihood Ratio discrimination procedure in case 2) above
and a comparison of the performance of the various procedures
considered in this thesis when the random variables have the
univariate negative exponential distribution. In Section IV
conclusions and recommendations arising from this study will
be presented.




II. REVIEW OF LITERATURE

Notation and Definitions 

In considering the classification problem, the following
structure will be assumed. The two categories or populations
have distribution functions $F$ and $G$, and without loss of
generality, since the measures with cumulative distribution
functions $F$ and $G$ are absolutely continuous with respect to
that given by $F + G$, the density functions $f$ and $g$ will be
supposed to exist. Random samples from the two distributions
are available: $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$, independent and
identically distributed as $F$ and as $G$ respectively; they may
be used to obtain information about the respective distributions.
An observation $z$ of the random variable $Z$ is made,
and the classification problem is to decide whether $Z$ is
distributed as $F$ or as $G$. The abbreviation $Z \sim F$ should be
read "$Z$ is distributed as $F$." The probabilities of
misclassification will be designated as

$$P_1 = \Pr\{\text{assign } Z \sim G \mid Z \sim F\}$$

$$P_2 = \Pr\{\text{assign } Z \sim F \mid Z \sim G\}$$

In the case that the distributions are negative exponential,

$$F(x) = 1 - e^{-\lambda x} \quad \text{and} \quad G(y) = 1 - e^{-\mu y}$$


Throughout this thesis reference will be made to discrimination
procedures which tend to behave similarly in the limit;
that is, as the number of sample observations upon which they
are based grows very large. This concept may be made explicit
by introducing two notions of consistency defined by Fix and
Hodges (2):




Definition 1:

The sequences of decision functions $\{d'_{m,n}\}$ and $\{d''_{m,n}\}$ are
said to be consistent in the sense of performance characteristics
if, whatever be the true distributions of the random
variables, for any $\epsilon > 0$ there exists $N$ so that if $m > N$ and
$n > N$,

$$\left|\Pr\{d'_{m,n} = \delta\} - \Pr\{d''_{m,n} = \delta\}\right| < \epsilon$$

for every possible decision $\delta$.


Definition 2:

The sequences of decision functions $\{d'_{m,n}\}$ and $\{d''_{m,n}\}$ are
said to be consistent in the sense of decision functions if,
whatever be the true distributions of the random variables,
for any $\epsilon > 0$ there exists $N$ so that if $m > N$ and $n > N$,

$$\Pr\{d'_{m,n} = d''_{m,n}\} > 1 - \epsilon$$


It is clear that consistency in the second sense implies 
that in the first. All proofs of consistency by Fix and 
Hodges and those in this thesis provide consistency in the 


stronger sense. The modifying phrase will however be omitted. 


Discrimination when the distributions are completely known 

When the two distributions $F$ and $G$ are completely known,
the problem of assigning an observation $z$ to one of the two
may be posed as a test of the hypothesis $Z \sim F$ against the
alternative $Z \sim G$. In this case, the Neyman-Pearson Lemma
gives the procedure:

Assign $Z$ as distributed according to $F$ if $f(z)/g(z) > t$, where $t$ is to be determined.

Assign $Z \sim F$ with probability $\gamma$ if $f(z)/g(z) = t$.

Otherwise assign $Z \sim G$.

This procedure is optimal in that for
any assigned probability of error "of the first kind," i.e.,
$\Pr\{\text{assign } Z \sim G \mid Z \sim F\} = P_1$, the probability of error "of the
second kind," i.e., $\Pr\{\text{assign } Z \sim F \mid Z \sim G\} = P_2$, of this
procedure is no greater than that of any other. The value of
$t$ is chosen in the classical hypothesis-testing problem so
that the probability of error of the first kind is some
chosen value. Since the class of Neyman-Pearson tests is
equivalent to the class of Bayes tests, the above procedure
(for the appropriate choice of $t$) is also optimal with respect
to minimizing any given weighted sum of the two error
probabilities.

This procedure will be designated $L(t)$. In the case
that $F$ and $G$ are negative exponential distributions, the $L(t)$
procedure is:

Assign $Z \sim F$ if and only if

$$\frac{\lambda}{\mu}\exp[(\mu - \lambda)z] > t$$
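As a minimal illustrative sketch (Python; the function and argument names are hypothetical and not taken from the thesis), the rule for known negative exponential parameters compares the likelihood ratio $(\lambda/\mu)\exp[(\mu-\lambda)z]$ with $t$:

```python
import math

def assign_to_F(z, lam, mu, t=1.0):
    """Return True when the rule assigns z to F (rate lam) rather than G (rate mu)."""
    # Likelihood ratio f(z)/g(z) for two negative exponential densities.
    ratio = (lam / mu) * math.exp((mu - lam) * z)
    return ratio > t
```

With `lam=1`, `mu=2` and `t=1` the rule reduces to "assign to F when z exceeds ln 2", which is the threshold at which the two densities cross.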


Discrimination when the distributions are completely unknown

When nothing can be assumed about the form of the distributions
corresponding to the two populations, the statistician has
only the observations $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ from which to
obtain information enabling him to classify $Z$ appropriately.
The procedures which Fix and Hodges (2) suggest involve the
estimation of the densities $f$ and $g$ at the point of interest,
and the use of these estimates in the likelihood ratio
procedure. The following theorem due to Fix and Hodges
demonstrates the asymptotic optimality of this procedure.


Theorem 1: Let $\hat f$ and $\hat g$ denote estimates of the densities $f$
and $g$ respectively and let $L^*(t;\hat f,\hat g)$ denote the likelihood
ratio discrimination procedure using $\hat f$ and $\hat g$ in place of $f$
and $g$. If $\hat f_{m,n}(z)$ and $\hat g_{m,n}(z)$ are consistent estimates for
$f(z)$ and $g(z)$ for all $z$ except possibly for $z \in N_{f,g}$, where
$P_F(N_{f,g}) = 0 = P_G(N_{f,g})$, then $L^*(t;\hat f,\hat g)$ is consistent with
$L(t)$.

The problem, then, is reduced to that of finding consistent
estimates of the densities $f$ and $g$. If the observation
space is reduced to one dimension by a non-negative transformation
$\rho$, such that $x_n \to x$ entails $\rho(x_n, x) \to 0$, and if,
further, for each $z$ except possibly for a null set under both
the $F$ and $G$ distributions $\rho(X,z)$ and $\rho(Y,z)$ are random
variables with continuous densities not both zero at zero, then
given the observation $z$ to be classified, the observations
$X_1, \ldots, X_m$; $Y_1, \ldots, Y_n$ may be replaced by $\rho(X_1,z), \ldots, \rho(X_m,z)$;
$\rho(Y_1,z), \ldots, \rho(Y_n,z)$ and the discrimination involves non-negative
univariate random variables. A consistent estimate
of the transformed densities is given by the following theorem
of Fix and Hodges.


Theorem 2: Let $X$ and $Y$ be non-negative. Let $f$ and $g$ be positive
and continuous at 0. Let $k(m,n)$ be a positive, integer-valued
function such that $k(m,n) \to \infty$, $\frac{1}{m}k(m,n) \to 0$ and
$\frac{1}{n}k(m,n) \to 0$ as $m, n \to \infty$ with $m/n$ tending to a limit
other than 0 or $\infty$. Define

$U$ = the $k(m,n)$th smallest value of the combined samples of X's and Y's

$M$ = number of X's $\le U$

$N$ = number of Y's $\le U$

then $M/(mU)$ is a consistent estimate for $f(0)$ and $N/(nU)$ is a
consistent estimate for $g(0)$.


The $L^*(t;\hat f,\hat g)$ procedure thus requires: Assign $Z \sim F$ if
and only if

$$\frac{M/m}{N/n} > t$$
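The counting rule can be sketched in a few lines of Python. This is an illustration only: it assumes univariate data, the transformation $\rho(x,z) = |x - z|$, and no ties among the distances; the function name is hypothetical.

```python
def fix_hodges_assign_to_F(xs, ys, z, k, t=1.0):
    """Assign z to F iff (M/m)/(N/n) > t, counting the k sample points nearest z."""
    m, n = len(xs), len(ys)
    # Distances to z, labelled by the sample each point comes from.
    labelled = sorted([(abs(x - z), "x") for x in xs] +
                      [(abs(y - z), "y") for y in ys])
    nearest = labelled[:k]
    M = sum(1 for _, label in nearest if label == "x")
    N = k - M
    # Compare M/m with t*N/n in cross-multiplied form to avoid dividing by N = 0.
    return M * n > t * N * m
```

With `k=1`, `t=1` and equal sample sizes this reduces to assigning z to the population of its nearest neighbor.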


Performance of the Non-Parametric Discriminator with finite
samples

Fix and Hodges (3) continued the investigation of their
non-parametric discrimination procedure by examining its
performance for small samples where distributions are Normal
with identical covariance matrix; that is, under conditions in
which the linear discriminant function is known to be an optimal
procedure. The bulk of that investigation is for univariate
distributions with $k$ (the total number of the available
samples used in the classification) equal to one. This is
the "Rule of Nearest Neighbor": classify $Z \sim F$ if and only if
$z$'s nearest neighbor is an $x$. Fix and Hodges obtain the
misclassification probability for this procedure for a considerable
range of sample sizes and for distance between population
means of 1, 2 and 3 times the standard deviation. Limiting
error probabilities (as $m = n \to \infty$) are obtained for $k = 1$ and
$k = 3$ with distance between population means of 1 to 5 times
the standard deviation. Some results are obtained for bivariate
normal distributions and an estimate of the performance
of the discriminator for $k > 3$ is obtained. One very interesting
result of this investigation is that, regardless of
the underlying distributions, as $m = n \to \infty$ the two error
probabilities of the rule of nearest neighbor are equal and no
greater than one-half.

Hager (4) investigated the performance of the "rule of 
nearest neighbor" under the assumption that F and G were neg- 
ative exponential. He contrasted this with the performance 
under the same conditions, of the linear discriminant func- 
tion and obtained misclassification probabilities for a wide 
range of (equal) sample sizes and parameter values for the 
latter procedure when F and G were Gamma distributions of 
order 1 to 20. His results in the exponential case are in- 
cluded in Section III of this thesis. 

Loftsgaarden and Quesenberry (6) proposed an alternative
density estimator to that suggested by Fix and Hodges, which
is consistent and applicable in a Euclidean space of any
dimension. The procedure is: let $j(m)$ be a sequence of integers
such that

$$\lim_{m \to \infty} j(m) = \infty$$

$$\lim_{m \to \infty} \frac{j(m)}{m} = 0$$

To estimate the density at a point $z$, using a sample $X_1, \ldots, X_m$,
let $w(1), \ldots, w(m)$ be the transformed sample $|x_1 - z|, \ldots,
|x_m - z|$ ordered from smallest to largest. Let $A_{w(j),z}$ denote
the volume (Lebesgue measure) of the hypersphere of radius
$w(j)$ centered at $z$; then

$$\tilde f(z) = \frac{j - 1}{m} \cdot \frac{1}{A_{w(j),z}}$$

is a consistent estimate of the density $f$ at the point $z$.
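A minimal one-dimensional sketch of this estimator follows (Python; the function name is hypothetical, and the "hypersphere" is taken to be the interval of half-length $w(j)$ about $z$, so its volume is $2w(j)$):

```python
def lq_density_estimate(sample, z, j):
    """Estimate f(z) as ((j - 1)/m) / A, A being the length of [z - w(j), z + w(j)]."""
    m = len(sample)
    if not 2 <= j <= m:
        raise ValueError("j must satisfy 2 <= j <= len(sample)")
    distances = sorted(abs(x - z) for x in sample)
    w_j = distances[j - 1]    # j-th smallest distance from z
    volume = 2.0 * w_j        # Lebesgue measure of a one-dimensional ball
    return (j - 1) / m / volume
```

For ten equally spaced points 0.1, 0.2, ..., 1.0 with `z=0.55` and `j=3`, the third smallest distance is 0.15 and the estimate is (2/10)/0.3, or about 0.667.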

If the density $g$ at $z$ is similarly estimated based on
$Y_1, \ldots, Y_n$, denoting the ordered transformed sample by
$v(1), \ldots, v(n)$ (where $\ell(n)$ is a sequence with the same
characteristics as $j$ above), then by Theorem 1 the procedure
$L^*(t;\tilde f,\tilde g)$, which requires assigning $Z \sim F$ if and only if

$$\frac{\dfrac{j-1}{m} \cdot \dfrac{1}{A_{w(j),z}}}{\dfrac{\ell-1}{n} \cdot \dfrac{1}{A_{v(\ell),z}}} > t$$

is consistent with the procedure $L(t)$ and hence asymptotically
optimal. Note that, if $t = 1$ and $m = n$, $j = \ell$, this procedure
is identical with the Fix-Hodges procedure with $k = j + \ell - 1$
since a majority of the $k$ nearest neighbors of $z$ are x's if
and only if $w(j) < v(\ell)$. In the general case, the procedures
$L^*(t;\hat f,\hat g)$ and $L^*(t;\tilde f,\tilde g)$ are quite similar but not identical.
The density estimate $\tilde f$ has applicability to problems other
than that of classification, while the estimate $\hat f$ is not so
versatile.

In their paper, Loftsgaarden and Quesenberry report a
small empirical study of the density estimator $\tilde f$ when the true
distributions are Uniform, negative exponential, and Normal.
Based on this study, they recommend that the sequence $j(n)$
take values not less than $n^{1/2}$.


In an article published in 1967, Cover and Hart (1)
evaluated the rule of nearest neighbor in a slightly different
context from that in which the previous investigations had
placed it. Their work is in a Bayesian context so that there
is a probability structure over the space $\{F, G\}$:

$$\eta_1 = \Pr\{Z \sim F\}$$

$$\eta_2 = \Pr\{Z \sim G\}$$

It is assumed also that the random sample of X's and Y's arises
in such a way that there is one fixed sample size, with the number
of X's within that sample being probabilistically determined.

If the classification loss function simply counts wrong
decisions, i.e., the loss is 0 or 1 depending on whether the
observation to be classified is assigned correctly or incorrectly;
if $R^*$ designates the expected risk of the Bayes procedure
with respect to a given prior distribution $(\eta, 1-\eta)$ where
$\eta = \Pr\{Z \sim F\}$; and if $R$ designates the expected risk (with respect
to the same prior distribution) of the rule of nearest
neighbor, then the result for discrimination between two
populations proved by Cover and Hart is given by the following:

Theorem 3: Let the space of possible values of the random
variables be a separable metric space. Let $f$ and $g$ be such
that, with probability one, $x$ is either 1) a continuity point
of $f$ and $g$, or 2) a point of non-zero probability measure.
Then the expected risk $R$ of the nearest neighbor procedure
has the bounds

$$R^* \le R \le 2R^*(1 - R^*)$$

These bounds are as tight as possible.

A comparable bound is obtained for the case of discrimination
among several populations.
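For a given Bayes risk the two-population bounds $R^* \le R \le 2R^*(1 - R^*)$ are easily evaluated. The helper below is a hypothetical illustration in Python, not part of the thesis:

```python
def nearest_neighbor_risk_bounds(bayes_risk):
    """Return (lower, upper) bounds on the nearest neighbor risk R for two populations."""
    if not 0.0 <= bayes_risk <= 0.5:
        raise ValueError("a two-population Bayes risk must lie in [0, 1/2]")
    return bayes_risk, 2.0 * bayes_risk * (1.0 - bayes_risk)
```

For example, a Bayes risk of .25 gives bounds (.25, .375); the upper bound never exceeds twice the Bayes risk and equals one-half when the Bayes risk is one-half.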
III. A LIKELIHOOD RATIO DISCRIMINANT


As was noted in the last section, when the probability
structure of the two populations to be discriminated is known
completely the likelihood ratio criterion gives the solution
to the classification problem: that is, classify $z$ as distributed
according to $F$ if

$$\frac{f(z)}{g(z)} \ge t \quad \text{for some } t,\ 0 \le t \le \infty$$


The procedure which Fix and Hodges selected with which to compare
the rule of nearest neighbor was the linear discriminant
function, since that procedure is known to be optimal under
the assumption that the populations under consideration are
Normally distributed with the same covariance matrix. Investigation
of the linear discriminant reveals that it is the
likelihood ratio procedure using the estimates of the population
means and the common covariance matrix as though they
were known to be correct. Hager's investigation indicated
that the use of the linear discriminant when the populations
have the negative exponential distribution can give very poor
results and that, in general, the probability of misclassification
is divided very unevenly between $P_1$ and $P_2$. It is not
surprising that the linear discriminant performs poorly on
distributions so radically different from the Normal as the
negative exponential. In fact, good performance in this case
would be quite surprising.

In attempting to discover a parametric discrimination
procedure with good properties, one might emulate the development
which leads to the discriminant function and suggest that
the random sample of the two populations be used to estimate
the parameter of the distributions. The likelihood ratio
procedure could then be carried out as though the estimates
were known to be correct. This procedure, which will be
designated $L(t;\hat\lambda,\hat\mu)$, would then be:

Let

$$\hat\lambda = \frac{m}{\sum_{i=1}^{m} x_i}, \qquad \hat\mu = \frac{n}{\sum_{i=1}^{n} y_i}$$

Assign $Z \sim F$ if

$$\frac{\hat\lambda}{\hat\mu}\exp[(\hat\mu - \hat\lambda)z] \ge t \quad \text{for some } t,\ 0 \le t \le \infty$$

One may easily verify that this procedure is, indeed,
asymptotically optimal. Since $\hat\lambda \to \lambda$ and $\hat\mu \to \mu$ as $m, n \to \infty$,
this result follows from Theorem 4 below, or from a more
general theorem of Hoel and Peterson (5).


Theorem 4 (Fix and Hodges): If

a) the estimates $\hat\theta_{m,n}$ are consistent and

b) for every $\theta$, $f_\theta(z)$ and $g_\theta(z)$ are continuous functions
of $\theta$ for every $z$ except perhaps for $z \in N_\theta$ where
$\Pr(N_\theta) = 0$ under the distribution given by $f_\theta$ and that given
by $g_\theta$,

then the sequence of discrimination procedures obtained
by applying the likelihood ratio principle with critical
value $t > 0$ to $f_{\hat\theta_{m,n}}(z)$ and $g_{\hat\theta_{m,n}}(z)$ is consistent with
$L(t)$.

It is noteworthy that the foregoing procedure (and the
linear discriminant function as well) makes no use of the
observation $z$ in determining the estimates of the parameters.


One might suppose that the use of $z$ for this purpose would improve
the performance of the procedure, at least for small
sample sizes. Accordingly one could pose the problem as one
of testing the composite hypothesis $H_0$: $z \sim F$ against the
alternative $H_1$: $z \sim G$, using the maximum likelihood estimates
of $\lambda$ and $\mu$ in both cases so that

$$\tilde\lambda = \frac{m+1}{\sum x_i + z}, \qquad \tilde\mu = \frac{n+1}{\sum y_i + z}$$

Accept $H_0$ if

$$\frac{\tilde\lambda}{\tilde\mu}\exp[(\tilde\mu - \tilde\lambda)z] > t$$

This procedure, which will be called $L(t;\tilde\lambda,\tilde\mu)$, is, of course,
asymptotically equivalent to $L(t;\hat\lambda,\hat\mu)$, so that it too is consistent
with $L(t)$ and hence optimal in the limit.
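Both plug-in rules can be sketched together. In the Python illustration below the function name and the `include_z` flag are hypothetical: with `include_z=False` the rates are estimated from the samples alone, and with `include_z=True` the observation z is added to both samples before estimating, as in the second procedure above.

```python
import math

def plug_in_assign_to_F(xs, ys, z, t=1.0, include_z=False):
    """Plug-in likelihood ratio rule for negative exponential samples."""
    if include_z:
        # Estimates that also use the observation to be classified.
        lam = (len(xs) + 1) / (sum(xs) + z)
        mu = (len(ys) + 1) / (sum(ys) + z)
    else:
        # Maximum likelihood estimates from the two samples alone.
        lam = len(xs) / sum(xs)
        mu = len(ys) / sum(ys)
    return (lam / mu) * math.exp((mu - lam) * z) > t
```

For very small samples the exponent can be large; a more defensive version would compare logarithms rather than the ratio itself.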

In the discussion up to this point, the problem of the 
choice of t in the two families of procedures which have been 
proposed has not been considered. The following lemma will 


clarify the problem. 


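Lemma 1: If $t$ is restricted to be a constant in the procedure
$L(t;\hat\lambda,\hat\mu)$ or $L(t;\tilde\lambda,\tilde\mu)$ as $\lambda$, $\mu$ range over the parameter space,
then if $t \ne 1$, as $m, n \to \infty$ for any $\epsilon > 0$ there exists $\delta$ so that

$$\left|\frac{\mu}{\lambda} - 1\right| < \delta \quad \text{implies} \quad P_1 > 1 - \epsilon \ \text{ or } \ P_2 > 1 - \epsilon$$

Proof: The procedure $L(t;\hat\lambda,\hat\mu)$ requires: assign $Z \sim F$ iff

$$\frac{\hat\lambda}{\hat\mu}\exp[(\hat\mu - \hat\lambda)z] > t$$

or

$$(\hat\mu - \hat\lambda)z > \ln t + \ln\frac{\hat\mu}{\hat\lambda}$$

Let $m, n \to \infty$ so that $\hat\lambda \to \lambda$, $\hat\mu \to \mu$ and suppose (without loss of
generality) that $\lambda < \mu$. Then the procedure assigns $Z \sim G$
incorrectly if and only if

$$z < \frac{\ln t + \ln(\mu/\lambda)}{\mu - \lambda}$$

Now suppose $t > 1$; since

$$P_1 = \Pr\{\text{assign } Z \sim G \mid Z \sim F\} = \Pr\left\{Z < \frac{\ln t + \ln(\mu/\lambda)}{\mu - \lambda}\right\}$$

$$= 1 - \exp\left\{-\lambda\,\frac{\ln t + \ln(\mu/\lambda)}{\mu - \lambda}\right\} = 1 - \left(\frac{t\mu}{\lambda}\right)^{-\lambda/(\mu - \lambda)}$$

the desired inequality is achieved if

$$\left(\frac{t\mu}{\lambda}\right)^{-\lambda/(\mu-\lambda)} < \epsilon, \quad \text{i.e.,} \quad \left(\frac{t\mu}{\lambda}\right)^{1/(\mu/\lambda - 1)} > \frac{1}{\epsilon}$$

Since, by assumption, $\mu/\lambda > 1$ and $t > 1$, there exists $\delta$ such
that if $\mu/\lambda - 1 < \delta$ the inequality above is satisfied and the
desired conclusion follows. If $t < 1$ a similar argument shows
that for appropriate values of $\mu/\lambda$, $P_2 > 1 - \epsilon$. Since $L(t;\tilde\lambda,\tilde\mu)$
is asymptotically equivalent to $L(t;\hat\lambda,\hat\mu)$ the result follows
for the former procedure as well.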
It is noteworthy that for $t = 1$, as $m, n \to \infty$,

$$\lim_{\mu/\lambda \to 1^+} P_1 = \lim_{\mu/\lambda \to 1^+}\left[1 - \exp\left\{-\frac{\ln(\mu/\lambda)}{\mu/\lambda - 1}\right\}\right] = 1 - e^{-1}$$

and similarly

$$\lim_{\mu/\lambda \to 1^+} P_2 = e^{-1}$$

(The subscripts on the P's are reversed as $\mu/\lambda \to 1^-$.)

In fact, it is easily verified that, for $t = 1$ as $m, n \to \infty$,
if $1/2 < \mu/\lambda < 2$ either $P_1$ or $P_2$ is greater than one-half. This
is not a desirable situation; however, it is better than the
situation which obtains in the use of the linear discriminant
function where, as Hager discovered, for $.3863 = [2(\ln 2) - 1]
< \mu/\lambda < 1/[2(\ln 2) - 1] = 2.589$, $P_1 > 1/2$ or $P_2 > 1/2$. Recall that
the rule of nearest neighbor has both error probabilities
bounded above by $1/2$ as $m, n \to \infty$ irrespective of the population
distributions.
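The limiting error probabilities for $t = 1$ can be checked numerically. The sketch below (Python; the function name is hypothetical) assumes $\lambda < \mu$ and known parameters, so that $Z$ is assigned to $F$ when $z > \ln(\mu/\lambda)/(\mu - \lambda)$, and expresses both errors through the ratio $\mu/\lambda$:

```python
import math

def limiting_errors(ratio):
    """Limiting (P1, P2) of the t = 1 rule, where ratio = mu/lam > 1."""
    if ratio <= 1.0:
        raise ValueError("ratio = mu/lam must exceed 1; interchange the populations otherwise")
    log_r = math.log(ratio)
    p1 = 1.0 - math.exp(-log_r / (ratio - 1.0))     # Pr{Z below threshold | Z ~ F}
    p2 = math.exp(-ratio * log_r / (ratio - 1.0))   # Pr{Z above threshold | Z ~ G}
    return p1, p2
```

At a ratio of exactly 2 this gives $P_1 = 1/2$ and $P_2 = 1/4$; for ratios between 1 and 2 the value of $P_1$ exceeds one-half, in agreement with the statement above.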

The above results are asymptotic and imply little about
the performance of the procedures for small samples. They do,
however, sharpen the problem which must be faced in using the
$L(t;\hat\lambda,\hat\mu)$ procedure. Either $t$ is fixed at 1 (for if $t \ne 1$ the
procedure may become arbitrarily bad as $m, n \to \infty$) or $t$ is made
a function of the observations. If the latter course is elected,
one might be interested in preventing the possibility of
misclassifying an observation with higher probability than
one-half. A plausible way to pursue this goal would be to
seek a minimax procedure; i.e., one which would make $P_1$ equal
to $P_2$. To do this one would, given the estimates $\hat\lambda$, $\hat\mu$, seek
$t = t(\hat\lambda,\hat\mu)$ so that

$$\Pr\left\{\frac{\hat\lambda}{\hat\mu}\exp[(\hat\mu - \hat\lambda)Z] < t \,\Big|\, Z \sim F_{\hat\lambda}\right\} = \Pr\left\{\frac{\hat\lambda}{\hat\mu}\exp[(\hat\mu - \hat\lambda)Z] \ge t \,\Big|\, Z \sim G_{\hat\mu}\right\}$$

where $F_{\hat\lambda}$ and $G_{\hat\mu}$ denote the negative exponential distributions
with parameters $\hat\lambda$ and $\hat\mu$, and use this value of $t$ for the
discrimination. The performance of this "minimax" procedure is
reported in this thesis.
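One way such a $t$ might be computed is sketched below in Python. This is an interpretation rather than the thesis's own algorithm: it assumes the two error probabilities are evaluated as though the estimates were the true parameters, that the smaller rate is passed first, and that a simple bisection is adequate; all names are hypothetical.

```python
import math

def minimax_t(lam, mu, iterations=200):
    """Return t equalizing P1 and P2 when lam < mu are treated as the true rates."""
    if not 0.0 < lam < mu:
        raise ValueError("this sketch assumes 0 < lam < mu")
    # The rule assigns F iff z > z0, so P1 = 1 - exp(-lam*z0) and P2 = exp(-mu*z0).
    def gap(z0):
        return 1.0 - math.exp(-lam * z0) - math.exp(-mu * z0)
    lo, hi = 0.0, 1.0
    while gap(hi) < 0.0:          # expand until the root is bracketed
        hi *= 2.0
    for _ in range(iterations):   # bisection; gap is increasing in z0
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    z0 = 0.5 * (lo + hi)
    # Translate the cut-off z0 back into the critical value t of the ratio.
    return (lam / mu) * math.exp((mu - lam) * z0)
```

For `lam=1`, `mu=2` the equalizing cut-off solves $1 - e^{-z} = e^{-2z}$, which gives $t = 1/(\sqrt5 - 1)$, about .809.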

In the foregoing material, the ratio $\lambda/\mu$ has occurred frequently.
It would be desirable for a discrimination procedure
to depend on the parameters of the distributions only through
this ratio. Indeed this is true for both $L(t;\hat\lambda,\hat\mu)$ and
$L(t;\tilde\lambda,\tilde\mu)$.

Theorem 5: In the procedures $L(t;\hat\lambda,\hat\mu)$ and $L(t;\tilde\lambda,\tilde\mu)$,
$P_1 = \Pr\{\text{assign } Z \sim G \mid Z \sim F\}$ depends on $\lambda$, $\mu$ only through $c = \lambda/\mu$.

A lemma will be established first:

Lemma 2: If $X$ has the negative exponential distribution with
parameter $\lambda$, then $X$ is distributed as $(-\ln U)/\lambda$ where $U$ has
the Uniform (0,1) distribution.

Proof of Lemma:

$$\Pr\{X \le x\} = F(x) = 1 - e^{-\lambda x}$$

$$\Pr\left\{\frac{-\ln U}{\lambda} \le x\right\} = \Pr\{\ln U \ge -\lambda x\} = \Pr\{U \ge e^{-\lambda x}\} = 1 - e^{-\lambda x}$$

The result follows by the Caratheodory extension theorem.
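Lemma 2 is the basis of the usual way of generating negative exponential variates from uniform ones. A minimal Python sketch (the function name is hypothetical; `random.random()` returns values in [0, 1), so the draw is shifted to (0, 1] to keep the logarithm finite):

```python
import math
import random

def exponential_variate(lam, rng=random):
    """Draw X = -ln(U)/lam with U uniform on (0, 1]."""
    u = 1.0 - rng.random()    # shift [0, 1) to (0, 1] so that log(u) is finite
    return -math.log(u) / lam
```

The sample mean of many such draws should be close to $1/\lambda$.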
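Proof of Theorem 5

Suppose $n = m = 1$, and write $X$ and $Y$ for the single observations
from $F$ and $G$. Then for the procedure $L(t;\hat\lambda,\hat\mu)$

$$\Pr\{\text{assign } Z \sim G \mid Z \sim F\} = \Pr\left\{\frac{Y}{X}\exp\left[\left(\frac{1}{Y} - \frac{1}{X}\right)Z\right] < t \,\Big|\, Z \sim F\right\}$$

$$= \Pr\left\{\frac{-\ln V/\mu}{-\ln U/\lambda}\exp\left[\left(\frac{\mu}{-\ln V} - \frac{\lambda}{-\ln U}\right)\frac{-\ln W}{\lambda}\right] < t\right\}$$

$$= \Pr\left\{c\,\frac{\ln V}{\ln U}\exp\left[\left(\frac{1}{\ln U} - \frac{1}{c\ln V}\right)(-\ln W)\right] < t\right\}$$

where $c = \lambda/\mu$ and $U$, $V$, and $W$ are independent and identically
uniformly distributed on (0,1) by Lemma 2.

Similarly for $L(t;\tilde\lambda,\tilde\mu)$,

$$\Pr\{\text{assign } Z \sim G \mid Z \sim F\} = \Pr\left\{\frac{Y+Z}{X+Z}\exp\left[\left(\frac{2}{Y+Z} - \frac{2}{X+Z}\right)Z\right] < t \,\Big|\, Z \sim F\right\}$$

$$= \Pr\left\{\frac{c\ln V + \ln W}{\ln U + \ln W}\exp\left[\left(\frac{2}{\ln U + \ln W} - \frac{2}{c\ln V + \ln W}\right)(-\ln W)\right] < t\right\}$$

The result for arbitrary $m$, $n$ follows by induction.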


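Note that

$$P_2 = \Pr\{\text{assign } Z \sim F \mid Z \sim G\} = \Pr\left\{\frac{\hat\lambda}{\hat\mu}\exp[(\hat\mu - \hat\lambda)Z] \ge t \,\Big|\, Z \sim G\right\}$$

$$= \Pr\left\{\frac{\hat\mu}{\hat\lambda}\exp[(\hat\lambda - \hat\mu)Z] \le 1/t \,\Big|\, Z \sim G\right\}$$

is equal to $P_1$ for the situation in which $\lambda$ and $\mu$ have been
interchanged and $t$ replaced by $1/t$, i.e., $P_2$ for $L(t;\hat\lambda,\hat\mu)$ equals
$P_1$ for $L(1/t;\hat\mu,\hat\lambda)$. A similar statement is valid for $L(t;\tilde\lambda,\tilde\mu)$.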
In seeking the error probabilities of the procedures
$L(t;\hat\lambda,\hat\mu)$ and $L(t;\tilde\lambda,\tilde\mu)$ one must calculate

$$P_1 = P\left\{\frac{\hat\lambda}{\hat\mu}\exp[(\hat\mu - \hat\lambda)Z] < t \,\Big|\, Z \sim F\right\}$$

$$= P\{\ln\hat\lambda - \ln\hat\mu + (\hat\mu - \hat\lambda)Z < \ln t \mid Z \sim F\}$$

where in procedure $L(t;\hat\lambda,\hat\mu)$, $m/\hat\lambda \sim$ Gamma $(\lambda,m)$, $n/\hat\mu \sim$ Gamma $(\mu,n)$,
$Z \sim$ Gamma $(\lambda,1)$, and in procedure $L(t;\tilde\lambda,\tilde\mu)$, $(m+1)/\tilde\lambda = U + Z$ where
$U \sim$ Gamma $(\lambda,m)$, $Z \sim$ Gamma $(\lambda,1)$ so that $U + Z \sim$ Gamma $(\lambda,m+1)$,
and $(n+1)/\tilde\mu = V + Z$ where $V \sim$ Gamma $(\mu,n)$. In the $L(t;\hat\lambda,\hat\mu)$ procedure,
if $t$ is a constant, it appears that $P_1$ should be calculable
by a straightforward triple numerical integration. In the
$L(t;\tilde\lambda,\tilde\mu)$ procedure the boundary of the region of integration
for $Z$ involves the solution of a transcendental equation, but
this too may be done numerically and $P_1$ calculated for fixed
$t$. However, when $t$ is permitted to be a function of the
observations, the integral becomes intractable. For this reason,
and because the investigator wished to compare the performance
of the Likelihood Ratio procedures to that of the
Loftsgaarden-Quesenberry procedure, which is almost impossible to assess
analytically, the decision was made to conduct this investigation
through a Monte Carlo study. The procedures investigated
are listed with Table 1.
The computer program, run on an IBM 360 computer, generated,
by means of the probability integral transform, the random
sample of X's and Y's, and the observation $Z$ to be classified.
The various classification procedures were performed
and correct or incorrect classification of $z$ was recorded.

The Monte Carlo procedure may be viewed as an attempt to
estimate the parameter $p$ of a Bernoulli random variable; i.e.,
the probability with which a randomly selected observation will
be misclassified. As such, the distribution of the estimates
which have been obtained may be estimated. Since $p$ is reasonably
close to one-half in all cases, and since 10,000 replications
of the Monte Carlo procedure were summed, it may be
assumed that the estimate

$$\hat p = \frac{1}{10{,}000}\sum_{i=1}^{10{,}000} B_i$$

where $B_i = 1$ with probability $p$ and $B_i = 0$ with probability $1 - p$,
has approximately the Normal distribution with mean $p$ and
variance $\sigma^2 = p(1-p)/10{,}000 \le .25 \times 10^{-4}$. Hence a 95% confidence interval
may be formed for the value of $p$ in each case:

$$.95 = \Pr\{|\hat p - p| \le 1.96\,\sigma\}$$

so that the half-width of the interval is at most $1.96 \times .005 = .0098$.
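The Monte Carlo scheme can be illustrated with a short Python sketch. It is not a reconstruction of the original IBM 360 program: the function name, the seed handling, and the choice to estimate only $P_1$ for the plug-in rule with $t = 1$ and equal sample sizes are assumptions made for illustration. The comparison is done on the logarithmic scale to avoid overflow when a sample sum is very small.

```python
import math
import random

def monte_carlo_p1(lam, mu, n, reps=10_000, seed=0):
    """Estimate P1 for the plug-in rule with t = 1; return (p_hat, 95% half-width)."""
    rng = random.Random(seed)

    def draw(rate):
        # Probability integral transform: -ln(U)/rate is negative exponential.
        return -math.log(1.0 - rng.random()) / rate

    errors = 0
    for _ in range(reps):
        lam_hat = n / sum(draw(lam) for _ in range(n))
        mu_hat = n / sum(draw(mu) for _ in range(n))
        z = draw(lam)                      # Z truly comes from F
        log_ratio = math.log(lam_hat) - math.log(mu_hat) + (mu_hat - lam_hat) * z
        if log_ratio <= 0.0:               # assigned to G: a misclassification
            errors += 1
    p_hat = errors / reps
    half_width = 1.96 * math.sqrt(p_hat * (1.0 - p_hat) / reps)
    return p_hat, half_width
```

With 10,000 replications the half-width cannot exceed .0098, whatever the value of the estimate.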
For comparison with these results, the analytically computed
misclassification probabilities of the rule of nearest
neighbor and linear discriminant function obtained by Hager
are reproduced.
Table 1


Misclassification Error Probabilities for the Procedures


Description of Procedures:

1. $L(t;\hat\lambda,\hat\mu)$, $t = 1$
2. $L(t;\tilde\lambda,\tilde\mu)$, $t = 1$
3. $L(t;\hat\lambda,\hat\mu)$, "minimax"
4. $L(t;\tilde\lambda,\tilde\mu)$, "minimax"
5. $L^*(t;\hat f,\hat g)$, $t = 1$, "Rule of Nearest Neighbor"
6. $L^*(t;\tilde f,\tilde g)$, $t = 1$, Loftsgaarden and Quesenberry Procedure, $j(n) = \ell(n) = n^{1/2}$
7. "Rule of Nearest Neighbor" - from Hager (4)
8. Linear Discriminant Function - from Hager (4)

c = ratio of the parameters of the two populations

N = size of sample from each population upon which classification
procedure is based
IV. SUMMARY AND CONCLUSIONS


A number of interesting facts are evident from inspection
of the results of the investigations conducted in this
thesis. Perhaps the most startling is that for values of
$c$ not greater than 5 and all sample sizes up to and including
20 the expected risk (with prior $(1/2, 1/2)$) of the linear
discriminant function is uniformly smaller than that for
either of the non-parametric procedures (see Figure 1). The
linear discriminant is equivalent to procedure $L(t;\hat\lambda,\hat\mu)$ with
$t$ chosen in a somewhat bizarre fashion, since it divides the
positive line into two intervals which are acceptance regions
for $\{Z \sim F\}$ and $\{Z \sim G\}$. Hence the linear discriminant
minimizes $P_2$ for the $P_1$ which it achieves, and though
the division of the total error probability is very uneven,
the average is small enough to better the non-parametric
procedures.

Also interesting is the fact that the expected risks of
procedures $L(t;\hat\lambda,\hat\mu)$ and $L(t;\tilde\lambda,\tilde\mu)$ are almost identical even
for very small sample sizes. In general $P_1$ is larger for one of
the two procedures than for the other, but its $P_2$ is
correspondingly smaller so as to keep the average almost constant. The
"minimax" procedure appears to achieve the desired equalization
of $P_1$ and $P_2$ fairly well for moderate sample sizes
($n \ge 10$), but fails quite badly for $n = 1$ or 2. It appears
that, for $n \ge 5$, the average risk is not increased appreciably
by using the "minimax" procedure.
Figure 1.


Expected Risk vs. c for various procedures; n = 2, ∞
The negligible improvement in the performance of the
likelihood ratio discriminant procedures for sample sizes in
excess of 10 and the extremely slow approach to optimality
of the Loftsgaarden and Quesenberry procedure are also interesting.
An example of this for $c = 10$ is shown in Figure 2.

The considerable disparity of the values of $P_1$ and $P_2$
for many of the procedures considered in this thesis raises
an interesting philosophical point which an investigator
should settle for himself before selecting one of these methods
for use. If, for example, one is willing to accept the
possibility that a large percentage of the members of one
population will be misclassified, although the average number
of misclassifications is apt to be moderate, then the
use of the linear discriminant function may be preferable to
the use of the non-parametric procedures (unless $c$ is very
large). If, however, one is reassured by the fact that the
rule of nearest neighbor makes errors no more than half the
time (asymptotically) no matter what the situation, one may
have a predilection for that procedure. The superiority in
terms of expected risk of the linear discriminant function
over the non-parametric procedures for small $c$ is shown in
Figure 1 where, for example, for $n = 2$, $c = 5$ the linear
discriminant has expected risk about .03 lower than the rule of
nearest neighbor; for $n = \infty$, $c = 5$ the difference is almost
.06. In fact, the performance of the linear discriminant
where $c < 2$ is almost identical with that of the best procedure
in this range, the likelihood ratio procedure with $t = 1$.
However, reference to Figure 3 indicates that in the same
cases, $P_2$ for the linear discriminant is much greater than
that for the rule of nearest neighbor. Also apparent in
Figure 3 is the non-monotonicity of $P_2$ for several procedures.
Table 1 gives both $P_1$ and $P_2$ for all cases considered in this
thesis so that expected risks for mixing probabilities other
than $(1/2, 1/2)$ may be easily calculated.

Figure 3. [Plot not reproduced; curves for selected procedures, labeled by procedure number, plotted against c.]

The following recommendations seem appropriate based on
this study. If one can be reasonably certain that the populations
are negative exponential, and there is no reason to
suppose that the unknown observation is more likely to be
from one of the populations than from the other, the minimax
version of $L(t;\hat\lambda,\hat\mu)$ (Procedure 3) would be a good choice if
$N \ge 5$. For smaller samples the same procedure with $t = 1$
(Procedure 1) seems better. If observations from one of the
populations are appreciably more likely than those from the
other, a procedure taking this fact into account by taking
more observations from the more likely population and/or
estimating the probability of occurrence of the populations
(if these probabilities are not known) should be considered.
A selection of the parameter $t$ in the chosen procedure in
order to minimize the expected risk with respect to the
estimated (or known) population probabilities could then be
made. Because the probability of classification error does
not decrease appreciably as $n$ increases from ten to infinity
for the likelihood ratio procedures, it appears that the use
of samples larger than ten in Procedures 1 - 4 is unwarranted
unless the cost of sampling is very small. If one cannot be
certain that the populations are negative exponential, a
choice between linear discriminant and a non-parametric procedure
may be appropriate. The attitude of the experimenter
toward the importance of $P_1$ and $P_2$ individually should
influence his decision in this case.